GLAST Infrastructure How-to-Fix : HTF Data Catalog Crawler
This page last changed on Jun 26, 2008 by tonyj.
How to Fix the Data Catalog Crawler

Owned by: Tony Johnson

The data catalog crawler checks the integrity of the data catalog: it checks each file shortly after it is registered in the data catalog, and periodically rechecks files to verify that they still exist and have not been changed. In addition, the crawler understands some file formats (some ROOT and FITS files) and can extract additional information from such files and store it as meta-data in the data catalog.
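As a concrete illustration, the per-file check amounts to something like the shell sketch below. This is a minimal sketch, not the actual crawler code: the checksum choice and the FITS keywords shown are arbitrary examples.

    #!/bin/bash
    # Illustrative per-file check (not the actual crawler code):
    # verify the file exists, record a checksum so a later re-check
    # can detect changes, and pull a few example keywords from a FITS
    # primary header (80-byte ASCII cards in 2880-byte blocks).
    f="$1"
    if [ ! -e "$f" ]; then
        echo "MISSING: $f"
        exit 1
    fi
    echo "size:     $(stat -c %s "$f")"
    echo "checksum: $(md5sum "$f" | cut -d' ' -f1)"
    case "$f" in
        *.fits|*.fit)
            # The first 2880-byte FITS block holds the start of the header
            dd if="$f" bs=2880 count=1 2>/dev/null | fold -w 80 \
                | grep -E '^(TELESCOP|INSTRUME|DATE-OBS|NAXIS)'
            ;;
    esac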
Setup

There are currently two crawlers, one for the DEV database and one for the PROD database. These crawlers are normally started automatically by cron jobs running on the appropriate hosts. Each crawler writes its log files into the work subdirectory of its working directory.
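The crontab entries themselves are not reproduced here; a hypothetical entry for the PROD crawler might look like the following (the schedule is invented, and it assumes the start script is a no-op when the crawler is already running):

    # Hypothetical crontab entry for user glast; the real schedule and
    # start-script behaviour may differ.
    */10 * * * * cd ~glast/datacat/prod && ./start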
Checking if the data crawler is running

Currently use http://glastlnx20.slac.stanford.edu:5080/ (soon to be replaced by Nagios). The crawler is monitored via JMX, and will only report that its status is OK if it has been actively checking for files within the last 90 seconds (i.e. if it has hung for any reason its status should be reported as not OK). The crawler status can also be checked via the Data Catalog Admin page, and the crawler logs messages to the Data Catalog message log.

Starting and Stopping the crawler

To start or stop the crawler you must log on to the appropriate machine as user glast. The crawler is stopped using:

    cd ~glast/datacat/prod/
    ./stop

and started using:

    cd ~glast/datacat/prod/
    ./start
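For a quick scripted check (and restart, if needed), something like the following sketch can be run as user glast on the crawler host. The grep pattern is an assumption; adjust it to whatever the monitoring page actually returns:

    #!/bin/bash
    # Hypothetical watchdog: restart the PROD crawler if the monitoring
    # page does not report OK. The "OK" match is an assumption.
    URL=http://glastlnx20.slac.stanford.edu:5080/
    if curl -s --max-time 30 "$URL" | grep -q OK; then
        echo "crawler reports OK"
    else
        echo "crawler not OK, restarting"
        cd ~glast/datacat/prod || exit 1
        ./stop
        sleep 10
        ./start
    fi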